TMA: Tera‐MACs/W neural hardware inference accelerator with a multiplier‐less massive parallel processor

Authors

Abstract

Computationally intensive inference tasks of deep neural networks have brought about a revolution in accelerator architecture, aiming to reduce power consumption as well as latency. The key figure-of-merit of hardware accelerators is the number of multiply-and-accumulate operations per watt (MACs/W); the state-of-the-art MACs/W reported so far has been several hundred Giga-MACs/W. We propose a Tera-MACs/W accelerator (TMA) with 8-bit activations and scalable integer weights of less than 1 byte. The architecture's main feature is a configurable processing element for matrix-vector operations. The proposed architecture uses a massive parallel processor that works without multipliers, which makes it attractive for energy-efficient, high-performance neural network applications. We benchmark our system's latency, power, and performance using AlexNet trained on ImageNet. Finally, we compare our accelerator's throughput with prior works. TMA outperforms state-of-the-art counterparts in terms of area efficiency, achieving 2.3 TMACs/[email protected] V on a 28-nm Virtex-7 FPGA chip.
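The multiplier-less idea the abstract describes can be illustrated in software: a product of an 8-bit activation and a small integer weight can be decomposed into shifts and adds over the set bits of the weight. The sketch below is a hypothetical illustration of that general technique, not the paper's actual processing-element design; the function names and the toy matrix-vector wrapper are assumptions.

```python
def shift_add_mac(acc: int, activation: int, weight: int) -> int:
    """Accumulate activation * weight using only shifts and adds.

    activation: 8-bit unsigned activation (0..255), as in the abstract.
    weight: signed integer weight of less than 1 byte; the sign is
    handled separately so the shift-add loop runs on the magnitude.
    """
    sign = -1 if weight < 0 else 1
    w = abs(weight)
    product = 0
    bit = 0
    while w:
        if w & 1:                          # for each set bit of the weight...
            product += activation << bit   # ...add a shifted copy of the activation
        w >>= 1
        bit += 1
    return acc + sign * product


def matvec(matrix, vector):
    """Toy matrix-vector product built only from the shift-add MAC primitive."""
    out = []
    for row in matrix:
        acc = 0
        for w, a in zip(row, vector):
            acc = shift_add_mac(acc, a, w)
        out.append(acc)
    return out
```

For example, `matvec([[3, -2], [1, 4]], [10, 5])` computes `[3*10 - 2*5, 1*10 + 4*5]` without a single multiply instruction in the MAC inner loop.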


Similar Articles

Towards a Low Power Hardware Accelerator for Deep Neural Networks

In this project, we take a first step towards building a low-power hardware accelerator for deep learning. We focus on RBM-based pre-training of deep neural networks and show that there is significant robustness to random errors in the pre-training, training, and testing phases of using such neural networks. We propose to leverage such robustness to build accelerators using low power but possibly un...


Nn-X - a hardware accelerator for convolutional neural networks

Gokhale, Vinayak A. M.S.E.C.E, Purdue University, August 2014. nn-X A Hardware Accelerator for Convolutional Neural Networks. Major Professor: Eugenio Culurciello. Convolutional neural networks (ConvNets) are hierarchical models of the mammalian visual cortex. These models have been increasingly used in computer vision to perform object recognition and full scene understanding. ConvNets consist...


A Hardware Implementation of a Binary Neural Image Processor

This paper presents the work that has resulted in the SAT processor, a dedicated hardware implementation of a binary neural image processor. The SAT processor is aimed specifically at supporting the ADAM algorithm and is currently being integrated into a new version of the C-NNAP parallel image processor. The SAT processor performs binary matrix multiplications, a task that is computationally c...


Artificial Neural Networks Processor - A Hardware Implementation Using a FPGA

Several implementations of Artificial Neural Networks have been reported in scientific papers. Nevertheless, these implementations do not allow the direct use of off-line trained networks, because of their much lower precision compared with the software solutions where the networks are prepared, or because of modifications in the activation function. In the present work a hardware solution called Artificial Neur...


New Hardware for Massive Neural Networks

Transient phenomena associated with forward-biased silicon p+-n-n+ structures at 4.2 K show remarkable similarities with biological neurons. The devices play a role similar to the two-terminal switching elements in Hodgkin-Huxley equivalent circuit diagrams. The devices provide simpler and more realistic neuron emulation than transistors or op-amps. They have such low power and current require...



Journal

Journal title: International Journal of Circuit Theory and Applications

Year: 2021

ISSN: 0098-9886, 1097-007X

DOI: https://doi.org/10.1002/cta.2917